Dyna(k): A Multi-Step Dyna Planning
Authors
Abstract
Dyna planning is an efficient way of learning from both real and imagined experience. Existing tabular and linear Dyna algorithms are single-step, because an "imaginary" feature is predicted only one step into the future. In this paper, we introduce a multi-step Dyna planning algorithm that predicts more steps into the future. Multi-step Dyna can work out a sequence of multi-step predictions when a real instance occurs, provided that the instance itself, or a similar experience, has been imagined (i.e., simulated from the model) and planned over. Our multi-step Dyna is based on a multi-step model, which we call the λ-model. The λ-model interpolates between the one-step model and an infinite-step model, and can be learned efficiently online. The multi-step Dyna algorithm, Dyna(k), uses the λ-model to generate predictions k steps ahead of the imagined feature and applies TD to this imagined multi-step transition.
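As a rough, hedged illustration of the planning step described above (not the paper's own pseudocode), a single imagined Dyna(k) update in a linear function-approximation setting might look as follows; F_k and b_k stand for an assumed k-step feature-transition and accumulated-reward model (for example, one produced by the λ-model), and every name in the sketch is hypothetical:

import numpy as np

def dyna_k_planning_step(theta, F_k, b_k, x, alpha, gamma, k):
    # theta : current value-function weights
    # F_k   : assumed k-step feature-transition matrix (maps x to the expected feature k steps later)
    # b_k   : assumed model of the discounted reward accumulated over those k steps
    # x     : a sampled ("imagined") feature vector
    x_k = F_k @ x    # predicted feature k steps ahead
    r_k = b_k @ x    # predicted accumulated reward over the k steps
    delta = r_k + (gamma ** k) * (theta @ x_k) - theta @ x    # TD error on the imagined k-step transition
    return theta + alpha * delta * x

A planning sweep would repeat an update of this kind over many sampled features, interleaved with ordinary one-step TD updates on real transitions.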
Similar references
Multi-step Linear Dyna-style Planning
In this paper we introduce a multi-step linear Dyna-style planning algorithm. The key element of the multi-step linear Dyna is a multi-step linear model that enables multi-step projection of a sampled feature and multi-step planning based on the simulated multi-step transition experience. We propose two multi-step linear models. The first iterates the one-step linear model, but is generally com...
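Reading "iterates the one-step linear model" literally, a minimal sketch of the first variant is to apply the one-step matrices repeatedly; here F and b denote the one-step feature-transition and reward models, and the function name and signature are mine, not the paper's:

import numpy as np

def k_step_projection(F, b, x, gamma, k):
    # Project a sampled feature k steps ahead by iterating the one-step linear model,
    # accumulating the discounted reward predicted along the way
    # (assumes a linear-Dyna-style model: next feature ~= F x, expected reward ~= b . x).
    r_total = 0.0
    for i in range(k):
        r_total += (gamma ** i) * (b @ x)   # reward predicted from the current feature
        x = F @ x                           # one-step feature projection
    return x, r_total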
A Multiagent Variant of Dyna-Q
This paper describes a multiagent variant of Dyna-Q called M-Dyna-Q. Dyna-Q is an integrated single-agent framework for planning, reacting, and learning. Like Dyna-Q, M-Dyna-Q employs two key ideas: learning results can serve as a valuable input for both planning and reacting, and results of planning and reacting can serve as a valuable input to learning. M-Dyna-Q extends Dyna-Q in that planning...
An Architectural Framework for Integrated Multiagent Planning, Reacting, and Learning
Dyna is a single-agent architectural framework that integrates learning, planning, and reacting. Well-known instantiations of Dyna are Dyna-AC and Dyna-Q. Here a multiagent extension of Dyna-Q is presented. This extension, called M-Dyna-Q, constitutes a novel coordination framework that bridges the gap between plan-based and reactive coordination in multiagent systems. The paper summarizes the ...
Reinforcement Learning with a Hierarchy of Abstract Models
Reinforcement learning (RL) algorithms have traditionally been thought of as trial and error learning methods that use actual control experience to incrementally improve a control policy. Sutton's DYNA architecture demonstrated that RL algorithms can work as well using simulated experience from an environment model, and that the resulting computation was similar to doing one-step lookahead plan...
Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming
This paper extends previous work with Dyna, a class of architectures for intelligent systems based on approximating dynamic programming methods. Dyna architectures integrate trial-and-error (reinforcement) learning and execution-time planning into a single process operating alternately on the world and on a learned model of the world. In this paper I present and show results for two Dyna architect...